Skip to content

Conversation

@YunchuWang
Copy link
Member

@YunchuWang YunchuWang commented Aug 13, 2025

This pull request introduces support for externalized payload storage using Azure Blob Storage, enabling large payloads to be stored out-of-band and referenced via tokens. The changes add a new abstraction for payload storage, implement an Azure Blob-based store, and provide integration points for both client and worker dependency injection to enable this feature. Additionally, a new data converter is introduced to automatically externalize large payloads based on configurable thresholds.

Externalized Payload Storage Infrastructure:

  • Added the IPayloadStore interface to abstract storing and retrieving large payloads out-of-band, with async upload and download methods.
  • Implemented BlobPayloadStore, an IPayloadStore that stores payloads as compressed blobs in Azure Blob Storage and returns opaque tokens for retrieval.
  • Introduced LargePayloadStorageOptions to configure externalized storage (enable/disable, threshold, connection string, container name).

Data Conversion Enhancements:

  • Added LargePayloadDataConverter, a DataConverter that wraps another converter and externalizes payloads exceeding a configurable size threshold, using the configured IPayloadStore.
  • Updated the base DataConverter class to support external storage by adding the UsesExternalStorage property and clarifying documentation.

Dependency Injection & Configuration:

  • Added UseExternalizedPayloads extension methods to both IDurableTaskClientBuilder and IDurableTaskWorkerBuilder to register the required services and wrap the configured data converter for externalized payload support. [1] [2]
  • Registered new dependencies and package references for Azure Blob Storage and required system annotations. [1] [2] [3]

These changes collectively enable seamless, configurable support for handling large payloads in Durable Task workflows by storing them externally and referencing them within orchestration messages.

@YunchuWang
Copy link
Member Author

sample validated
info: Microsoft.DurableTask[1]
Durable Task gRPC worker starting and connecting to uksautop-etbbheebdcgm.uksouth.durabletask.io.
info: Microsoft.Hosting.Lifetime[0]
Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
Hosting environment: Production
info: Microsoft.Hosting.Lifetime[0]
Content root path: D:\projects\durabletask-dotnet\samples\LargePayloadConsoleApp
info: Microsoft.DurableTask.Client.Grpc.GrpcDurableTaskClient[40]
Scheduling new LargeInputEcho orchestration with instance ID '3745aa067f8a4a3bb48c40b8b4e8093a' and 71 bytes of input data.
info: Microsoft.DurableTask[4]
Sidecar work-item streaming connection established.
Started orchestration with direct large input. Instance: 3745aa067f8a4a3bb48c40b8b4e8093a
info: Microsoft.DurableTask.Client.Grpc.GrpcDurableTaskClient[43]
Waiting for instance '3745aa067f8a4a3bb48c40b8b4e8093a' to complete, fail, or terminate.
info: Microsoft.DurableTask.Worker.Orchestrations[600]
'LargeInputEcho' orchestration with ID '3745aa067f8a4a3bb48c40b8b4e8093a' started.
info: Microsoft.DurableTask.Worker.Activities[603]
'Echo' activity of orchestration ID '3745aa067f8a4a3bb48c40b8b4e8093a' started.
info: Microsoft.DurableTask.Worker.Activities[604]
'Echo' activity of orchestration ID '3745aa067f8a4a3bb48c40b8b4e8093a' completed.
'Echo' activity of orchestration ID '3745aa067f8a4a3bb48c40b8b4e8093a' completed.
'Echo' activity of orchestration ID '3745aa067f8a4a3bb48c40b8b4e8093a' completed.
'Echo' activity of orchestration ID '3745aa067f8a4a3bb48c40b8b4e8093a' completed.
'Echo' activity of orchestration ID '3745aa067f8a4a3bb48c40b8b4e8093a' completed.
'Echo' activity of orchestration ID '3745aa067f8a4a3bb48c40b8b4e8093a' completed.
'Echo' activity of orchestration ID '3745aa067f8a4a3bb48c40b8b4e8093a' completed.
'Echo' activity of orchestration ID '3745aa067f8a4a3bb48c40b8b4e8093a' completed.
'Echo' activity of orchestration ID '3745aa067f8a4a3bb48c40b8b4e8093a' completed.
'Echo' activity of orchestration ID '3745aa067f8a4a3bb48c40b8b4e8093a' completed.
'Echo' activity of orchestration ID '3745aa067f8a4a3bb48c40b8b4e8093a' completed.
'Echo' activity of orchestration ID '3745aa067f8a4a3bb48c40b8b4e8093a' completed.
info: Microsoft.DurableTask.Worker.Orchestrations[601]
'LargeInputEcho' orchestration with ID '3745aa067f8a4a3bb48c40b8b4e8093a' completed.
RuntimeStatus: Completed
UsesExternalStorage (result converter): True
SerializedInput: dts:v1:payloadtest:2025/08/13/20/16/51/e3afe72b95ce49f493a34bd32e1b1b52
SerializedOutput: dts:v1:payloadtest:2025/08/13/20/17/02/16290bdc18f94a5caeb9e37ee5afc517
Deserialized input equals original: True
Deserialized output equals original: True
Deserialized input length: 1048576

@YunchuWang YunchuWang marked this pull request as ready for review August 13, 2025 20:20
@YunchuWang YunchuWang requested a review from cgillum August 13, 2025 20:59
@YunchuWang
Copy link
Member Author

TODO: when load test, getting 503 busy periodically. should add retry when upload/download from blob
This orchestration failed with the following error:

Microsoft.DurableTask.TaskFailedException: Task 'SayHello' (#8) failed with an unhandled exception: Ingress is over the account limit. RequestId:7f562149-b01e-0044-3d2b-116c5b000000 Time:2025-08-19T17:03:34.3967120Z Status: 503 (Ingress is over the account limit.) ErrorCode: ServerBusy Content: ServerBusyIngress is over the account limit. RequestId:7f562149-b01e-0044-3d2b-116c5b000000 Time:2025-08-19T17:03:34.3967120Z Headers: Server: Microsoft-HTTPAPI/2.0 x-ms-request-id: 7f562149-b01e-0044-3d2b-116c5b000000 x-ms-client-request-id: 6583fc0c-16ac-4d93-baa7-f7fb9bc1f3cb x-ms-error-code: ServerBusy Date: Tue, 19 Aug 2025 17:03:34 GMT Content-Length: 213 Content-Type: application/xml

Stack trace:

at Microsoft.DurableTask.Worker.Shims.TaskOrchestrationContextWrapper.CallActivityAsync[T](TaskName name, Object input, TaskOptions options)
at AspNetWebApp.Scenarios.HelloActivities.RunAsync(TaskOrchestrationContext context, ActivityPayload payload) in /src/Orchestrations/HelloActivities.cs:line 18
at Microsoft.DurableTask.TaskOrchestrator`2.Microsoft.DurableTask.ITaskOrchestrator.RunAsync(TaskOrchestrationContext context, Object input)
at Microsoft.DurableTask.Worker.Shims.TaskOrchestrationShim.Execute(OrchestrationContext innerContext, String rawInput)

@YunchuWang
Copy link
Member Author

image image perf test, 8mb, 100 rps, avg orch latency 30s

@YunchuWang
Copy link
Member Author

image image

@YunchuWang
Copy link
Member Author

image image

@YunchuWang
Copy link
Member Author

image tested scheduler backend

Copy link
Member

@cgillum cgillum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great feature, and a nice use of the DataConverter abstraction. Here's my first round of feedback.


namespace Microsoft.DurableTask.Grpc.Tests;

public class LargePayloadTests(ITestOutputHelper output, GrpcSidecarFixture sidecarFixture) : IntegrationTestBase(output, sidecarFixture)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need tests for entities as well. Specifically, entity operations and entity state needs to be considered.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see samples, but I don't see automated tests for entities. Did I miss it?

Copy link
Member Author

@YunchuWang YunchuWang Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for confusion @cgillum i answered this in other comments. i found the current testing backend setup does not support entity, so i added it in the sample and validated it e2e. the entity sample e2e result looks correct. can we use sample for verification and later add this to dts e2e tests which more closely simulate the production experience?
image

image

@YunchuWang YunchuWang requested a review from nytian September 1, 2025 16:47

// Upload synchronously in this context by blocking on async. SDK call sites already run on threadpool.
byte[] bytes = this.utf8.GetBytes(json);
string token = this.payLoadStore.UploadAsync(bytes, CancellationToken.None).GetAwaiter().GetResult();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very problematic for us to call async code from a non-async abstraction. It can create very bad performance and availability problems for the customer, like thread starvation. We should look into supporting async serialize and deserialize methods from the DataConverter abstraction.


namespace Microsoft.DurableTask.Grpc.Tests;

public class LargePayloadTests(ITestOutputHelper output, GrpcSidecarFixture sidecarFixture) : IntegrationTestBase(output, sidecarFixture)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see samples, but I don't see automated tests for entities. Did I miss it?

@YunchuWang
Copy link
Member Author

abandoning this PR and move this feature change to another pr #468

@YunchuWang YunchuWang closed this Sep 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants